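Markdown syntax: one # makes a level 1 title, ## makes a level 2 title.

Wrap text in double asterisks for bold and in single asterisks for italics.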

R Markdown is a form of literate programming: a more elaborate way to write everything down, with notes and code in the same document.

  1. load packages
  2. source("x.r"): here, bring in functions.r, which has other functions in it. This way you load all your functions at the beginning instead of defining them later. source() is also useful if you have many functions that overlap between projects: keep them all in one file and call that file from each project so you don't lose them.
library(tidyverse)
## Loading tidyverse: ggplot2
## Loading tidyverse: tibble
## Loading tidyverse: tidyr
## Loading tidyverse: readr
## Loading tidyverse: purrr
## Loading tidyverse: dplyr
## Conflicts with tidy packages ----------------------------------------------
## filter(): dplyr, stats
## lag():    dplyr, stats
library(plotly)
## 
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
## 
##     last_plot
## The following object is masked from 'package:stats':
## 
##     filter
## The following object is masked from 'package:graphics':
## 
##     layout
source("functions.r")

Download Data

Downloaded a file into R

download.file("https://raw.githubusercontent.com/swcarpentry/r-novice-gapminder/gh-pages/_episodes_rmd/data/gapminder-FiveYearData.csv", destfile = "data/gapminder-FiveYearData.csv")
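download.file() does not create a missing folder, so the call above assumes that a data folder already exists. A minimal sketch of a guard follows. The helper name download_if_missing is hypothetical and is not part of the original notes.

```r
# Hypothetical helper (not from the notes): download a file only when it is
# not already on disk, creating the destination folder first if needed.
download_if_missing <- function(url, destfile) {
  dest_dir <- dirname(destfile)
  # download.file() will not create a missing folder for us
  if (!dir.exists(dest_dir)) {
    dir.create(dest_dir, recursive = TRUE)
  }
  if (file.exists(destfile)) {
    return(invisible(FALSE))  # already there, nothing downloaded
  }
  download.file(url, destfile = destfile)
  invisible(TRUE)             # a download was attempted
}

# Usage with the URL from these notes (needs a network connection):
# download_if_missing(
#   "https://raw.githubusercontent.com/swcarpentry/r-novice-gapminder/gh-pages/_episodes_rmd/data/gapminder-FiveYearData.csv",
#   "data/gapminder-FiveYearData.csv"
# )
```

Skipping the download when the file exists also makes it cheap to re-knit the document many times.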

Read and Display Data

gapminder<- read.csv("data/gapminder-FiveYearData.csv")
head(gapminder)
##       country year      pop continent lifeExp gdpPercap
## 1 Afghanistan 1952  8425333      Asia  28.801  779.4453
## 2 Afghanistan 1957  9240934      Asia  30.332  820.8530
## 3 Afghanistan 1962 10267083      Asia  31.997  853.1007
## 4 Afghanistan 1967 11537966      Asia  34.020  836.1971
## 5 Afghanistan 1972 13079460      Asia  36.088  739.9811
## 6 Afghanistan 1977 14880372      Asia  38.438  786.1134

Plot Data

What is the life expectancy in those years?

p <- ggplot(data=gapminder,aes(x=year,y=lifeExp)) +
    geom_point()

p

Let's make it interactive.

ggplotly(p)
## We recommend that you use the dev version of ggplot2 with `ggplotly()`
## Install it with: `devtools::install_github('hadley/ggplot2')`

Making your own functions

If you are repeating yourself in your code, you may be able to solve that problem by making your own function!

newfunctionname <- function(arg1, arg2){ arg1 + arg2 }

The example here is standard error: sd() gives the standard deviation, sqrt() gives the square root, and length() gives the sample size because it counts how many values are in the sample.

On the right, the environment will have a new section for functions, and it will list se. (Above, we loaded an R script with functions in it, so we don't need to define them separately every time. It is kind of like installing a package, except that the functions show up in the environment.) roxygen package (look into?)

se<- function(x){
  sd(x)/sqrt(length(x))
}

Try it on data: 1. make a data set 2. call se(dataset)

cars<- c(3,4,5,6,7,10)

se(cars)
## [1] 1.013794
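One thing to watch: sd() returns NA when the data contain missing values, so se() does too. The sketch below is a hypothetical variant (the name se_na and the na.rm argument are assumptions, not something from functions.r) that can drop missing values before counting.

```r
# Hypothetical variant of se() that can skip missing values.
se_na <- function(x, na.rm = FALSE) {
  if (na.rm) {
    x <- x[!is.na(x)]  # drop NAs so length() counts only real observations
  }
  sd(x) / sqrt(length(x))
}

cars <- c(3, 4, 5, 6, 7, 10)
cars_with_gap <- c(cars, NA)

se_na(cars)                         # same calculation as se(cars)
se_na(cars_with_gap)                # NA, because sd() returns NA
se_na(cars_with_gap, na.rm = TRUE)  # same value as se_na(cars)
```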

Data manipulation with dplyr

You will likely want to get subsections of your data frame and/or calculate means of a variable for a certain subsection. dplyr is your friend! (Put a function name into single quotes when writing about it in R.) First, learn to select columns: from a data frame with columns a to d, select(data.frame, a, c) keeps only a and c.

You can also exclude columns: select(data.frame, -a, -c)

Look at the column names with names(), or the row names with row.names().

gapminder <- read.csv("data/gapminder-FiveYearData.csv")
year_country_gdp <- select(gapminder, year, country, gdpPercap)
year_country_gdp<- select(gapminder,-pop, -continent, -lifeExp)
names(year_country_gdp)
## [1] "country"   "year"      "gdpPercap"
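The a to d example from the text above can be tried directly. This is a small sketch on a made-up data frame called toy, which is not part of the gapminder data.

```r
library(dplyr)

# Toy data frame with columns a to d (made-up values)
toy <- data.frame(a = 1:3, b = 4:6, c = 7:9, d = 10:12)

keep_ac <- select(toy, a, c)    # keep only a and c
drop_ac <- select(toy, -a, -c)  # drop a and c, keep the rest

names(keep_ac)
names(drop_ac)
```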

Then we want to filter. filter() is the same idea as select() but for rows, and it takes logical conditions as arguments. You can use pipes with it: %>% is a pipe. It says, I want everything before the pipe to be the first argument of the next function, so you don't need to retype the data frame inside filter() (no gapminder$continent needed if you filter this way). It also lets you build up layers.

year_country_gdp_euro <- select(gapminder, year, country, gdpPercap) %>% filter(continent=="Europe")

Use ctrl+shift+M as a shortcut for %>%.

The above won't work because select() removed continent, but filter() then looks for continent.

You can put a period in the first argument location because the pipe already supplies the data frame (which is always the first argument, so we could also leave it blank): select(., year, country, gdpPercap)

year_country_gdp_euro <- gapminder %>% 
  filter(continent=="Europe") %>%
  select(.,year, country, gdpPercap)
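To see why the order of the two steps matters, here is a sketch on a made-up miniature of the gapminder columns (the data frame mini and its values are illustrative only). tryCatch() is used so the failing version reports a message instead of stopping the script.

```r
library(dplyr)

# Made-up miniature of the gapminder columns used above
mini <- data.frame(
  country   = c("Albania", "Angola", "Austria"),
  continent = c("Europe", "Africa", "Europe"),
  year      = c(1952, 1952, 1952),
  gdpPercap = c(1601.1, 3520.6, 6137.1)
)

# select() first: continent is gone, so filter() raises an error
wrong_order <- tryCatch(
  mini %>% select(year, country, gdpPercap) %>% filter(continent == "Europe"),
  error = function(e) "error: continent is no longer a column"
)

# filter() first: continent still exists when it is needed
right_order <- mini %>%
  filter(continent == "Europe") %>%
  select(year, country, gdpPercap)

wrong_order
right_order
```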

The equivalent without pipes would be:

euro <- filter(gapminder, continent=="Europe")
year_country_gdp_euro <- select(euro, year, country, gdpPercap)

This needs an intermediate object, euro, and the order is very important. The reason to use pipes is to avoid writing out intermediate objects over and over.
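A quick way to convince yourself that the pipe only moves the data frame into the first argument is to compare the piped and unpiped calls. This is a sketch on made-up data; the data frame scores is not from the notes.

```r
library(dplyr)

scores <- data.frame(name = c("a", "b", "c"), score = c(10, 25, 40))  # made-up data

# With a pipe: scores becomes the first argument of filter()
piped <- scores %>% filter(score > 20)

# Without a pipe: the same call written out in full
direct <- filter(scores, score > 20)

# With an explicit period marking where the piped data goes
dotted <- scores %>% filter(., score > 20)

all.equal(piped, direct)
all.equal(piped, dotted)
```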

Exploring the amazing 'group_by' and 'summarize' functions

group_by: allows you to take one big data frame, separate it into groups, and run operations on each group separately. Use group_by and summarize together.

summarize: makes a new column. In mean_gdp=mean(gdpPercap), mean_gdp is the name of the new column and mean(gdpPercap) is the value within it.

Add a new column with the standard error of this gdp.

mean_gdp_percountry<- gapminder %>%
  group_by(country) %>% 
  summarize(mean_gdp=mean(gdpPercap), se_gdp=se(gdpPercap))

mean_gdp_percountry
## # A tibble: 142 x 3
##        country   mean_gdp     se_gdp
##         <fctr>      <dbl>      <dbl>
##  1 Afghanistan   802.6746   31.23550
##  2     Albania  3255.3666  344.20223
##  3     Algeria  4426.0260  378.26190
##  4      Angola  3607.1005  336.56641
##  5   Argentina  8955.5538  537.68144
##  6   Australia 19980.5956 2256.11315
##  7     Austria 20411.9163 2787.23968
##  8     Bahrain 18077.6639 1563.29518
##  9  Bangladesh   817.5588   67.86165
## 10     Belgium 19900.7581 2422.32683
## # ... with 132 more rows

Task: get the mean, se, and sample size for lifeExp by continent.

You can get the sample size 2 ways: length(continent) or n() (can be blank inside; it is a function built in to 'dplyr').

mean_life_percontinent<- gapminder %>%
  group_by(continent) %>% 
  summarize(mean_life_expectancy=mean(lifeExp), se_life_expectancy=se(lifeExp), sample_size=n())

mean_life_percontinent
## # A tibble: 5 x 4
##   continent mean_life_expectancy se_life_expectancy sample_size
##      <fctr>                <dbl>              <dbl>       <int>
## 1    Africa             48.86533          0.3663016         624
## 2  Americas             64.65874          0.5395389         300
## 3      Asia             60.06490          0.5962151         396
## 4    Europe             71.90369          0.2863536         360
## 5   Oceania             74.32621          0.7747759          24
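The two ways of counting mentioned above give the same numbers. Here is a sketch on made-up data (the data frame obs and its column names are illustrative) that computes both side by side.

```r
library(dplyr)

# Made-up data: three observations in group x, two in group y
obs <- data.frame(
  group = c("x", "x", "x", "y", "y"),
  value = c(1, 2, 3, 10, 20)
)

sizes <- obs %>%
  group_by(group) %>%
  summarize(size_from_length = length(value), size_from_n = n())

sizes
```

n() has the small advantage that it does not need a column name at all.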

You can group by multiple things. Here, country is added to continent.

mean_life_percontinent<- gapminder %>%
  group_by(continent,country) %>% 
  summarize(mean_life_expectancy=mean(lifeExp), se_life_expectancy=se(lifeExp), sample_size=n())

mean_life_percontinent
## # A tibble: 142 x 5
## # Groups:   continent [?]
##    continent                  country mean_life_expectancy
##       <fctr>                   <fctr>                <dbl>
##  1    Africa                  Algeria             59.03017
##  2    Africa                   Angola             37.88350
##  3    Africa                    Benin             48.77992
##  4    Africa                 Botswana             54.59750
##  5    Africa             Burkina Faso             44.69400
##  6    Africa                  Burundi             44.81733
##  7    Africa                 Cameroon             48.12850
##  8    Africa Central African Republic             43.86692
##  9    Africa                     Chad             46.77358
## 10    Africa                  Comoros             52.38175
## # ... with 132 more rows, and 2 more variables: se_life_expectancy <dbl>,
## #   sample_size <int>
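The "# Groups: continent" line in the output above means the result is still grouped by continent: summarize() removes only the last grouping variable. If later steps should treat the table as a whole, ungroup() removes the rest. This is a sketch on made-up data; the data frame visits is not from the notes.

```r
library(dplyr)

# Made-up data with two levels of grouping
visits <- data.frame(
  region = c("north", "north", "north", "south", "south"),
  site   = c("a", "a", "b", "c", "c"),
  count  = c(2, 4, 6, 8, 10)
)

by_site <- visits %>%
  group_by(region, site) %>%
  summarize(mean_count = mean(count), sample_size = n())

group_vars(by_site)       # grouping that remains after summarize()

by_site_flat <- ungroup(by_site)
group_vars(by_site_flat)  # no grouping left
```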

Combine dplyr with ggplot (select is for columns, filter is for rows). The pipe moves the data into ggplot: whatever came before the pipe becomes the data argument of ggplot().

euro_countries <- gapminder %>% 
  filter(continent== "Europe") %>% 
  ggplot(aes(x=year,y=lifeExp, color=country)) +geom_line() + facet_wrap(~country)

euro_countries

Data manipulation with tidyr

R likes to have 'long' format data, where every row is an observation: a single column holds the observations and the other columns serve to identify each observation. (Exceptions apply when you have multiple types of observations.) To switch back and forth between 'wide' (how we typically enter data in a spreadsheet) and 'long', use tidyr.
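As a sketch of the wide to long switch, the example below uses pivot_longer() and pivot_wider() from current tidyr on a made-up table (the data frame wide and its values are illustrative). Older tidyr versions provide gather() and spread() for the same job.

```r
library(tidyr)

# Made-up wide table: one row per country, one column per year
wide <- data.frame(
  country = c("A", "B"),
  y1952   = c(30.1, 55.2),
  y1957   = c(32.4, 57.0)
)

# Wide to long: one row per country and year
long <- pivot_longer(wide, cols = -country, names_to = "year", values_to = "lifeExp")

# Long back to wide: one column per year again
back_to_wide <- pivot_wider(long, names_from = year, values_from = lifeExp)

long
back_to_wide
```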